It is not simple to decide how to address the multiple issues that can arise during the first run through the assumptions of the Rasch model. There is also often more than one way to obtain a scale that fits. As a rule, a well-fitting scale should be obtained while preserving as much of the scale's original features as possible.
Typically, articles report a) the fit of a scale at the start of the analysis, with a focus on the results of testing each of the Rasch assumptions. Next, they report b) the fit at the end of the calibration process, once all breaches of the assumptions have been addressed. How to get from a) to b) most efficiently is not written down anywhere; however, some general advice is available.
One general recommendation is to start with the assumption of local item dependency (LID). Strong LID can go along with multidimensionality and can also cause item thresholds to malfunction. Depending on the extent of the LID, aggregating dependent items into testlets can resolve the multidimensionality. Once the questionnaire is free of LID and found to be unidimensional, the interpretation of the reliability starts to make sense. At this point, the item fit statistics can also be investigated further to make sure that all items or testlets work well to measure the construct. Finally, the analysis of differential item functioning (DIF) is undertaken. Depending on the purpose of the questionnaire, it is worthwhile to gather information on how the items work for different person subgroups or different assessment situations, and to make sure that DIF does not indicate an unfair treatment of some subgroups.
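This recommended order of checks can be sketched with eRm as follows. This is a sketch only; `dat` (a data frame of polytomous item responses) and `grp` (a person-factor grouping vector) are hypothetical placeholders, not objects from the course data.

```r
library(eRm)

# 1) Fit the partial credit model and obtain person parameters
pcm = PCM(dat, sum0 = TRUE)   # dat: hypothetical item response data
pp  = person.parameter(pcm)

# 2) Local item dependency: correlations of the person-parameter residuals
res = residuals(pp)
round(cor(res, use = "pairwise.complete.obs"), 2)  # values > ~0.2 suggest LID

# 3) Once LID is resolved, the reliability becomes interpretable
SepRel(pp)

# 4) Item fit: infit/outfit MSQ should stay in an acceptable range
itemfit(pp)

# 5) DIF: likelihood-ratio test with the sample split by a person factor
LRtest(pcm, splitcr = grp)    # grp: hypothetical grouping vector
```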
The complexity of the analysis and the number of models undertaken can challenge the clarity of the reporting of a Rasch analysis.
Schematizations can be helpful.
Example PTGI (Kunz 2019) - Conceptualization of the analysis approach
Also, instead of showing each small adjustment step and test, only the fit statistics at the start and in the final Rasch model can be reported in an analysis summary table.
Example WHODAS 2.0 (Chiu 2019) - Summarizing
During the course, the following issues were found for the SRG scale.
In the exercise of seminar 7 (LID continued), creating a testlet for SRG15 and SRG13 resulted in bad fit. Exceptionally, I would suggest removing SRG15 from the scale. Usually, one should be cautious with the deletion of items, especially when testing the metric properties of a scale that is already established in practice and research. When developing a new scale, where items are still being selected, the deletion of misfitting items is less problematic.
Also, with regard to DIF, let us assume that we want to come up with one metric for the entire SCI sample, and that the systematic difference between the injury-level subgroups in item SRG10 is not understood as "favoritism" toward one of the subgroups. In summary, we do not split the item and keep just one difficulty estimate for SRG10.
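Before settling on a single difficulty estimate, the extent of the DIF can be quantified. The following sketch assumes a fitted PCM object `pcm` and a grouping vector `grp` holding the para.tetra_1 indicator (both hypothetical placeholders here):

```r
library(eRm)

# Likelihood-ratio test: does item functioning differ between the
# injury-level subgroups? (pcm and grp are hypothetical placeholders)
lr = LRtest(pcm, splitcr = grp)
summary(lr)

# Wald tests flag the individual item parameters driving a significant
# LR test, e.g. those of SRG10
Waldtest(pcm, splitcr = grp)
```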
First load the data but remove item SRG15:
urlfile = "https://raw.githubusercontent.com/CarolinaFellinghauer/UNIZH_HS2020_Rasch/master/Data/SRG_Data_Course_UNIZH.csv?token=AB5GB47UIUWV7F5NMGA33T27K5IQ2"
srg.data=read.csv(url(urlfile))
dim(srg.data)
## [1] 450 26
colnames(srg.data)
## [1] "X" "ID" "Age"
## [4] "Gender" "Completeness" "para.tetra_1"
## [7] "traumatic_nontraumatic" "PersStat" "SRG1"
## [10] "SRG2" "SRG3" "SRG4"
## [13] "SRG5" "SRG6" "SRG7"
## [16] "SRG8" "SRG9" "SRG10"
## [19] "SRG11" "SRG12" "SRG13"
## [22] "SRG14" "SRG15" "TP"
## [25] "ID_Unik" "wgt"
srg.items=c("SRG1", "SRG2", "SRG3", "SRG4", "SRG5", "SRG6", "SRG7", "SRG8", "SRG9", "SRG10", "SRG11", "SRG12", "SRG13", "SRG14") # minus "SRG15"
#create age groups with breaks at 45 and 60
#the choice of breaks here is, admittedly, somewhat arbitrary
srg.data[, "Age_grp"] = cut(srg.data[, "Age"], breaks = c(0, 45, 60, 85))
# exogenous variables to test DIF for
srg.pf = c("Age", "Age_grp", "Gender", "Completeness", "para.tetra_1", "traumatic_nontraumatic")
# dataset with SRG items and the person factors
data.srg = srg.data[,c(srg.items, srg.pf)]
#check response coding (frequencies including missing values) for each SRG-item
#apply(data.srg, 2, table, useNA="always")
#recode traumatic_nontraumatic = 3 to NA
data.srg[which(data.srg[,"traumatic_nontraumatic"]==3), "traumatic_nontraumatic"] = NA
Now the analysis is run again without item SRG15:
library(eRm)
PCM.srg.2 = PCM(data.srg[,srg.items], sum0 = TRUE)
plotPImap(PCM.srg.2, sort = TRUE, main = "SRG-metric")
PP.srg.2 = person.parameter(PCM.srg.2)
SepRel(PP.srg.2)
Separation Reliability: 0.9028
itemfit(PP.srg.2)
Itemfit Statistics:
Chisq df p-value Outfit MSQ Infit MSQ Outfit t Infit t Discrim
SRG1 449.132 428 0.232 1.047 1.006 0.668 0.113 0.544
SRG2 439.557 428 0.339 1.025 1.028 0.252 0.432 0.542
SRG3 479.012 431 0.055 1.109 1.055 1.549 0.904 0.557
SRG4 469.749 430 0.090 1.090 1.131 1.190 2.079 0.521
SRG5 413.503 432 0.731 0.955 0.977 -0.527 -0.340 0.615
SRG6 317.565 430 1.000 0.737 0.769 -3.676 -4.115 0.721
SRG7 328.104 432 1.000 0.758 0.767 -3.389 -4.165 0.716
SRG8 378.689 432 0.969 0.875 0.936 -1.449 -1.025 0.605
SRG9 368.535 432 0.988 0.851 0.876 -2.272 -2.091 0.641
SRG10 383.925 431 0.950 0.889 0.900 -1.784 -1.670 0.603
SRG11 474.988 432 0.075 1.097 1.061 1.379 1.000 0.546
SRG12 389.044 431 0.927 0.901 0.937 -0.880 -0.933 0.584
SRG13 495.101 432 0.019 1.143 1.114 1.984 1.808 0.454
SRG14 363.940 431 0.992 0.842 0.861 -2.170 -2.373 0.675
resid_srg.2 = residuals(PP.srg.2)
urlfunction = "https://raw.githubusercontent.com/CarolinaFellinghauer/UNIZH_HS2020_Rasch/master/RFunctions/LIDGraph.R?token=AB5GB42GG6I6YI2MF3KDINC7K5JG6"
source(urlfunction)
LIDgraph(PCM.srg.2, cut = 0.2, vertex.color = "pink", vertex.size = 40,
vertex.label.dist = 0)
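As a cross-check of the LIDgraph output, the residual correlations behind it can also be listed directly. This short sketch reuses the residual matrix resid_srg.2 computed above and lists the item pairs exceeding the same 0.2 cut-off:

```r
# Residual correlations: item pairs above the 0.2 cut-off
q3 = cor(resid_srg.2, use = "pairwise.complete.obs")
diag(q3) = NA                                  # ignore the diagonal
high = which(abs(q3) > 0.2, arr.ind = TRUE)
data.frame(item1 = rownames(q3)[high[, 1]],
           item2 = colnames(q3)[high[, 2]],
           r     = round(q3[high], 2))
```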
thres_map_fct = "https://raw.githubusercontent.com/CarolinaFellinghauer/UNIZH_HS2020_Rasch/master/RFunctions/threshold_map_fct.r"
source(url(thres_map_fct))
ThresholdMap(thresholds(PCM.srg.2))
The deletion of SRG15 resulted in:
In principle, once the scale has been calibrated with the Rasch model, a transformation table is created which links the raw scores of the final scale to the corresponding logit-scaled ability estimates, and the logit scores to user-friendly rescaled scores. The range of the user-friendly score is typically 0 to 100, which allows scores to be expressed as a percentage of the maximum obtainable score. A transformed range from 0 to 100 only makes sense if the original score range is already large; one would not rescale to 0-100 if the original instrument's score range is very small. In that case, another convenient score range is selected.
library(scales)
names(PP.srg.2)
## [1] "X" "X01" "X.ex" "W" "model"
## [6] "loglik" "loglik.cml" "npar" "iter" "betapar"
## [11] "thetapar" "se.theta" "theta.table" "pred.list" "hessian"
## [16] "mpoints" "pers.ex" "gmemb"
T.Table = as.data.frame(cbind(PP.srg.2$pred.list[[1]]$x, PP.srg.2$pred.list[[1]]$y))
colnames(T.Table) = c("Raw Score", "Logit Score")
#create a rescaled Rasch-Score in a convenient range, here from 0 to 100
Transformed_Score = scales::rescale(T.Table[,2], to = c(0, 100))
T.Table = cbind(T.Table, Transformed_Score)
colnames(T.Table) = c("Raw Scores", "Logit Scores", "0-100 Scores")
#round the two last columns to two decimals
T.Table[,c(2,3)] = round(T.Table[, c(2,3)], 2)
T.Table
## Raw Scores Logit Scores 0-100 Scores
## 1 0 -4.34 0.00
## 2 1 -3.48 9.33
## 3 2 -2.69 17.90
## 4 3 -2.19 23.30
## 5 4 -1.82 27.40
## 6 5 -1.51 30.78
## 7 6 -1.24 33.71
## 8 7 -1.00 36.33
## 9 8 -0.77 38.73
## 10 9 -0.57 40.97
## 11 10 -0.37 43.10
## 12 11 -0.18 45.14
## 13 12 0.00 47.12
## 14 13 0.18 49.05
## 15 14 0.35 50.96
## 16 15 0.53 52.87
## 17 16 0.70 54.78
## 18 17 0.88 56.73
## 19 18 1.07 58.72
## 20 19 1.26 60.78
## 21 20 1.46 62.94
## 22 21 1.67 65.24
## 23 22 1.90 67.74
## 24 23 2.15 70.52
## 25 24 2.45 73.72
## 26 25 2.80 77.60
## 27 26 3.28 82.74
## 28 27 4.04 91.00
## 29 28 4.87 100.00
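For score reporting, the transformation table can serve as a simple lookup. The small helper below is a hypothetical convenience function (not part of eRm or scales); it returns the 0-100 score for a given raw sum score from the T.Table object created above, addressing the columns by position:

```r
# Look up the rescaled 0-100 score for a raw sum score;
# raw scores outside the table range return NA
raw_to_rescaled = function(raw, ttable = T.Table) {
  idx = match(raw, ttable[, 1])   # first column holds the raw sum scores
  ttable[idx, 3]                  # third column holds the 0-100 scores
}

raw_to_rescaled(12)   # mid-range raw score
raw_to_rescaled(28)   # maximum raw score
```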
An interesting application of Rasch (and IRT) derived parameters is found in computer adaptive testing (CAT). Computer-based testing is a broad field which includes linear and adaptive testing.
In linear testing, the same test questions are administered in the same order to all respondents, much like in a standard paper-based test. In contrast to paper-and-pencil testing, however, the computer can immediately process the responses and compute the respondents' scores.
Adaptive testing is a type of testing where the test adjusts to the ability of the respondent. The questions a respondent receives are selected based on the previous responses. In that sense, the test adapts to the response pattern and the ability of the respondent. The goal of a CAT is to select items that reduce the standard error of measurement and help obtain a stable estimate of the ability. Typically, when the ability estimate varies only within a small margin of error, the test can be stopped. Computer-adaptive testing offers several advantages, such as shortening the test delivery time and immediate score reporting to candidates.
Boston University: School of Public Health
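The item selection principle can be illustrated with a small hand-rolled sketch for dichotomous Rasch items (all item names and parameter values here are made up for illustration): at each step, the item with the highest Fisher information at the current ability estimate is administered next.

```r
# Illustrative CAT step for dichotomous Rasch items
# item difficulties (made-up values) and a current ability estimate
betas = c(I1 = -1.5, I2 = -0.5, I3 = 0.0, I4 = 0.8, I5 = 1.6)
theta = 0.3

# Fisher information of a dichotomous Rasch item: p * (1 - p)
info = function(theta, beta) {
  p = plogis(theta - beta)
  p * (1 - p)
}

administered = c("I3")                       # items already given
candidates   = setdiff(names(betas), administered)
next_item    = candidates[which.max(info(theta, betas[candidates]))]
next_item    # the most informative remaining item at theta = 0.3
```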
In R, the package mirtCAT, from the same authors as mirt, allows adaptive testing using the item parameters estimated with mirt or with any other IRT software (via manual entry of the difficulty parameters). The package mirtCAT can also be used for multidimensional testing. While other packages for CAT are available in R, mirtCAT allows generating a user-friendly interface to administer a CAT.
Building an interface for a CAT requires specifying:
- the first item to administer (start_item =)
- the criterion for selecting the next item (criteria =)
- the method for estimating the ability (method =)
- the stopping rules and further design options (design = list())
The values that these settings can take can be found by typing ?mirtCAT.
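A minimal call could look as follows. This is a sketch only: `mod` is assumed to be a unidimensional model previously fitted with mirt::mirt(), `pat` a hypothetical full response pattern used to run the CAT offline, and the design values are illustrative choices, not recommendations.

```r
library(mirt)
library(mirtCAT)

# mod is assumed to be a fitted unidimensional mirt model, e.g.
# mod = mirt(responses, 1, itemtype = "Rasch")
# pat is a hypothetical complete response pattern for one respondent

res = mirtCAT(mo = mod,
              local_pattern = pat,    # run the CAT offline on this pattern
              method     = "MAP",     # ability estimation method
              criteria   = "MI",      # select the most informative item
              start_item = "MI",      # start with the most informative item
              design     = list(min_SEM   = 0.3,  # stop when SE drops below 0.3
                                max_items = 10))  # or after 10 items at most
summary(res)
```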